Content-aware data distribution over cluster nodes

Abstract

Proper distribution of data items may substantially improve processing performance in a distributed environment. However, typical data storage systems, as well as computational frameworks, do not pay special attention to that aspect. In this paper, the author introduces two custom addressing methods, using the Scalable Distributed Two-Layer Datastore as an example. The basic idea of those methods is to ensure that data items stored on the same cluster node are similar to each other, following the concepts of data clustering. Still, most clustering mechanisms have serious problems with scalability, which is a severe limitation in Big Data applications. The proposed methods make it possible to efficiently distribute a data set over a set of buckets. As the experimental results show, all of them generate good results in comparison with traditional clustering techniques such as k-means, agglomerative clustering, and BIRCH. The experiments in a distributed environment also show that proper data distribution can improve the effectiveness of processing.
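
The abstract does not describe how the two addressing methods work, so the following is only a minimal sketch of the general idea of content-aware addressing, not the author's method. It contrasts a content-agnostic hash address with a nearest-centroid address that sends similar items to the same bucket. The function names, the fixed centroids, and the use of Euclidean distance are all assumptions made for illustration.

```python
import hashlib
import math


def hash_bucket(key: str, n_buckets: int) -> int:
    """Content-agnostic addressing: the bucket depends only on the key's hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def nearest_bucket(item, centroids) -> int:
    """Content-aware addressing (hypothetical): pick the bucket whose
    centroid is closest to the item in Euclidean distance."""
    return min(range(len(centroids)), key=lambda b: math.dist(item, centroids[b]))


def distribute(items, centroids):
    """Assign every item to a bucket; each bucket stands for one cluster node."""
    buckets = {b: [] for b in range(len(centroids))}
    for item in items:
        buckets[nearest_bucket(item, centroids)].append(item)
    return buckets


# Toy data: two visibly separate groups of 2-D points (illustrative only).
items = [(0.1, 0.2), (0.0, 0.3), (5.1, 4.9), (4.8, 5.2)]
centroids = [(0.0, 0.0), (5.0, 5.0)]  # assumed to be known in advance
buckets = distribute(items, centroids)
```

With hash addressing, two similar items can land on different nodes because the hash ignores their content. With nearest-centroid addressing, each bucket holds mutually similar items, and the per-item cost is linear in the number of buckets rather than in the size of the data set.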


Similar resources

Locality-Aware Content Distribution

We present the design and deployment of the Julia locality-aware content distribution algorithm. Our novel contributions are locality-aware node selection, forming a dynamically changing topology, and division of the file into varying-length chunks based on locality of the transfer. We present a large-scale WAN deployment on more than 250 PlanetLab machines. We show that our technique can improv...

Content-Aware Master Data Management

Master data management (MDM) provides a means to link data from various structured data sources and to generate a consolidated master record for entities such as customers or products. However, a large amount of valuable information about entities exists as unstructured content in documents. In this paper, we show how MDM can be made aware of information from unstructured content by automatical...

An Effective Path-aware Approach for Keyword Search over Data Graphs

Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Modeling Loss Data by Phase-Type Distribution

Insurers are always concerned about the losses on the policies they cover and look for methods that allow them to model past loss data in order to make an optimal decision. In this research, phase-type distributions are introduced for modeling loss data, including the related statistical inference and the use of the EM algorithm to estimate the distribution parameters. Finally, the possibility of using this distribution in modeling grouped data ...

Time-Aware Content Summarization of Data Streams

Major media companies such as The Financial Times, The Wall Street Journal, or Reuters generate huge amounts of textual news data worldwide on a daily basis. Finance specialists rely on this information to grasp the market sentiment and make decisions accordingly (e.g., buy or sell stocks). An important application for them is to mine this mass of information for extracting recurrent behaviors ...

Journal

Journal title: Intelligent Data Analysis

Year: 2021

ISSN: 1088-467X, 1571-4128

DOI: https://doi.org/10.3233/ida-205360